GeneCrunch and Europort

Authors

  • Reinhard Schneider
  • Michael Schlenkrich
Abstract

The SGI POWER CHALLENGEarray™ represents a hierarchical supercomputer because it combines distributed and shared memory technology. We present two projects, Europort and GeneCrunch, that took advantage of such a configuration. In Europort we performed scalability demonstrations with up to 64 processors using applications relevant to the chemical and pharmaceutical industries. GeneCrunch, a project in bioinformatics, analyzed the whole yeast genome using the software system GeneQuiz. This project showcased the future demands of HPC in the pharmaceutical industry: tackling the analysis of rapidly growing volumes of sequence information. GeneQuiz, an automated software system for large-scale genome analysis developed at EMBL/EBI, aims at predicting the function of new genes by using an automated, rigorous, rule-based system that processes the results of sequence analysis and database searches to build databases of annotations and predictions. In GeneCrunch, more than 6,000 proteins from baker's yeast, whose complete genomic sequence was finished in 1996, were analyzed on an SGI® POWER CHALLENGEarray with 64 processors (R8000® at 90 MHz) in three days rather than the seven months predicted for a typical workstation.

1. Hierarchical Parallel Supercomputing

Figure 1: Schematic representation of distributed and shared memory systems. Arrows mark the memory system for which data coherency has to be guaranteed; the small unlabeled boxes represent the caches. On the SGI shared memory system data coherency is achieved in hardware, whereas cluster solutions normally rely on software.

The design of parallel supercomputers can be separated into two classes: distributed memory and shared memory systems (Figure 1). The essential differentiators are the location of the memory, the way it is accessed, and the data coherency model. In a distributed memory machine each processing element (PE) has its own memory subsystem and data coherency is implemented in software; the shared memory concept provides a unified memory for all processors, and data coherency is maintained in hardware.

Distributed memory systems can be subdivided into two classes. The first class is the cluster of individual workstations. In this design, access to the memory of a neighboring PE has to involve the CPU of that PE, which results in a push-and-pull mechanism. This concept is very latency sensitive, since the two processors have to be synchronized. Achieving high bandwidth would require a substantial investment in the network interface of every individual PE, which in practice limits the bandwidth between PEs. Data coherency among the PEs has to be maintained in software, which requires explicit message passing within the parallel application (a minimal sketch of such an exchange is given below). The great advantage of this design, however, is its expandability, since there is no real limit to the number of PEs.

The second class of distributed memory systems has a more sophisticated memory access network that bypasses the processor of the PE on which the memory access occurs. This greatly reduces latency, at the cost of a more complicated and therefore more expensive coupling of the PEs. Normally such designs do not include any cache-coherency protocol among the processors, so applications have to ensure data coherency explicitly.

The shared memory concept provides a global memory to all processors, and the coherency of the data in the caches is maintained in hardware.
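To make the coherency contrast concrete, here is a minimal sketch in C of the explicit data exchange that a distributed memory design forces on the application; the MPI reduction and the toy workload are purely illustrative and are not taken from the Europort or GeneCrunch codes.

    #include <mpi.h>
    #include <stdio.h>

    /* On a distributed memory machine there is no hardware-maintained
     * coherency across PEs: each PE computes on its private memory, and a
     * consistent global result exists only after explicit communication. */
    int main(int argc, char **argv)
    {
        int rank, partial, total = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        partial = rank + 1;              /* stand-in for local work on this PE */

        /* Explicit message passing (here a reduction onto rank 0) is what
         * keeps the PEs' views of the data consistent. */
        MPI_Reduce(&partial, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum over all PEs = %d\n", total);

        MPI_Finalize();
        return 0;
    }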
Hardware-maintained coherency enables the use of shared memory parallel programming, which does not require explicit programming of the data flow among the processors. This concept does not rule out parallel programming with message-passing libraries, since those libraries can be optimized to exploit the shared memory and thereby offer very low latency and high bandwidth among the individual threads. The downside of a shared memory architecture is its expandability, which is currently limited to 36 processors on the POWER CHALLENGE™ product line.

Figure 2: Schematic representation of the POWER CHALLENGEarray.

The POWER CHALLENGEarray combines both concepts to exploit the best of both worlds. Figure 2 shows a schematic representation of the POWER CHALLENGEarray, with the individual shared memory nodes coupled by a network. Since the number of individual nodes is low, the relative cost of the network is small, which allows the use of a high-performance network such as HIPPI. While in principle there is no limit to the number of shared memory nodes, a feasible configuration can have up to eight nodes, resulting in a maximum of 288 processors. Data coherency is maintained in hardware within a node, while a distributed memory model holds among the nodes, so software (message passing) is required to maintain data coherency across them.

Two programming models can be used on the POWER CHALLENGEarray: shared memory and message passing. Shared memory parallelization can use up to 36 processors within one shared memory node; message-passing libraries have to be used if more than 36 processors are needed. Currently two message-passing protocols, MPI and PVM, are supported on the POWER CHALLENGEarray. These libraries use the shared memory hardware if messages are exchanged among threads residing on one shared memory node; if messages are exchanged among threads on different nodes, PVM and MPI switch to a socket mechanism that is optimized for the HIPPI hardware. An interesting approach is to use both programming paradigms: shared memory parallel threads run on each node and talk to threads on other nodes via message passing. In this way both the coarse-grained and the fine-grained parallelism of an application can be exploited, as sketched below.
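The following is a minimal sketch of this hybrid scheme, with MPI handling the coarse-grained communication between shared memory nodes and an OpenMP-style directive handling the fine-grained threading inside a node; the original Europort and GeneCrunch applications used PVM or MPI together with SGI's shared memory parallelization, so the pragma and the toy workload here are assumptions made for illustration only.

    #include <mpi.h>
    #include <stdio.h>

    #define N 1000000   /* size of the illustrative global problem */

    /* Hybrid sketch: one MPI process per shared memory node exploits the
     * coarse-grained parallelism (messages travel over the inter-node
     * network), while threads inside the node exploit the fine-grained
     * parallelism on the hardware-coherent shared memory. */
    int main(int argc, char **argv)
    {
        int rank, nprocs, i;
        double local = 0.0, global = 0.0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        /* Coarse grain: each node owns one contiguous block of the problem. */
        int chunk = N / nprocs;
        int start = rank * chunk;
        int end   = (rank == nprocs - 1) ? N : start + chunk;

        /* Fine grain: threads of this node share memory; coherency is in hardware. */
        #pragma omp parallel for reduction(+:local)
        for (i = start; i < end; i++)
            local += 1.0 / (double)(i + 1);   /* stand-in for real work */

        /* Explicit message passing combines the per-node results. */
        MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("global result = %f\n", global);

        MPI_Finalize();
        return 0;
    }

Running one message-passing process per node keeps the traffic on the HIPPI network coarse and infrequent, while the threads inside each node communicate through the low-latency, hardware-coherent shared memory.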
